The equilibrium free energy landscape of off-lattice model heteropolymers as a function of an
internal coordinate, namely the end-to-end distance, is reconstructed from out-of-equilibrium steered
molecular dynamics data. This task is accomplished via two independent methods: by employing
an extended version of the Jarzynski equality (EJE) and the inherent structure (IS) formalism. A
comparison of the free energies estimated with these two schemes with equilibrium results obtained
via the umbrella sampling technique reveals a good quantitative agreement among all the approaches
in a range of temperatures around the "folding transition" for the two examined sequences. In
particular, for the sequence with good foldability properties, the mechanically induced structural
transitions can be related to thermodynamical aspects of folding. Moreover, for the same sequence
the knowledge of the landscape profile allows for a good estimation of the lifetimes of the native
configuration for temperatures ranging from the folding to the collapse temperature. For the random
sequence, mechanical and thermal unfolding appear to follow different paths along the landscape.

PACS numbers: 87.15.Aa, 82.37.Rs, 05.90.+m



I. INTRODUCTION 



Several states of matter are characterized by a nontrivial free energy landscape (FEL), which can be at the origin
of peculiar structural and dynamical features. Supercooled liquids, glasses, atomic clusters and biomolecules [1] are
typical examples of systems whose thermodynamical behavior can be traced back to the intricate topological properties
of the underlying FEL. The pioneering work by Stillinger and Weber on inherent structures (ISs) of liquids [2] revealed
the importance of investigating the stationary points of the potential energy surface (PES) for characterizing their
dynamical and thermodynamical properties. Similar approaches have been proposed and successfully applied, in
glasses [...] and supercooled liquids [...], to the identification of the structural-arrest temperature. This temperature
marks a topological transition from a dynamics evolving in a landscape dominated by minima to one where unstable
saddles play a major role [...].

More recently, this kind of analysis has been applied to the study of protein models [...]. In
particular, several studies have been devoted to the reconstruction of the PES and of the FEL topology in terms of
graphs (at various levels of coarse graining) connecting the folded states to the unfolded structures [...]. The
knowledge of the graph structure connecting the various metastable states and of the transition probabilities among
them allows for a reconstruction of the folding dynamics in terms of a master equation [13]. Moreover, detailed analyses
of the thermodynamical and dynamical features characteristic of proteins have quite recently been carried out in
terms of ISs [...]. These analyses suggest that the folding process of a protein towards its native configuration
depends crucially on the structure and topological properties of its (free) energy landscape, somehow confirming
the conjecture that the FEL of a protein has a funnel-like shape, with the native configuration located inside the
so-called native valley at the bottom of the funnel itself [...].

On the other hand, mechanical unfolding of single biomolecules represents a powerful technique to extract information
on their internal structure as well as on their unfolding and refolding pathways [...]. However,
mechanical unfolding of biomolecules is an out-of-equilibrium process: unfolding events occur on time scales much
shorter than the typical relaxation time of the molecule towards equilibrium. Nonetheless, by using the equality
introduced by Jarzynski [20], the free energy of mechanically manipulated biomolecules can be recovered as a function of
an externally controlled parameter [21, 22]. Moreover, an extended version of the Jarzynski equality (EJE) has been
proposed in order to estimate the equilibrium free energy landscape in absence of applied forces as a function of an
internal coordinate of the system (usually, the end-to-end distance ζ) [..., 26]. Quite recently this approach has
been successfully applied to data obtained from nanomanipulation of the titin I27 domain with Atomic Force Microscopy
(AFM) [27, 28] and from steered molecular dynamics simulations of a mesoscopic off-lattice protein model [29]. The
analysis reported in [29] was devoted to a single sequence previously identified as a reasonably fast folder [30, 31] and
it was essentially performed at a single temperature.

As an extension of the analysis performed in Ref. [29], in the present paper we reconstruct, for two different
sequences with bad and good folding properties, the equilibrium FEL as a function of the end-to-end distance ζ in
two distinct ways: namely, by employing the EJE approach and the IS distributions. We will show that specific
features of the landscapes characteristic of a protein, i.e. a good folder, can be singled out from a comparison of
the two approaches. Furthermore, the different unfolding structural transitions can be associated to the detachment
of specific strands of the examined heteropolymers. In particular, the investigation of the IS distributions allows us
to give an estimate of the energetic and entropic barriers separating the native state from the completely stretched
configuration. Moreover, for the good folder the temperature dependence of the free energy barrier heights and the
unfolding times can be related.

An important aspect to clarify is the relationship between the thermal and mechanical unfolding pathways of
proteins: experimental [32] as well as numerical works [33, 34] seem to suggest that these paths are indeed different.
However, there are indications that the thermal paths can be recovered also via the manipulation procedure in the
limit of very low pulling velocities. This seems to be in agreement with our findings for the good folder, which
indicate that the observed structural transitions, induced by mechanical unfolding, can be put in direct relationship
with the thermal transitions usually identified for the folding/unfolding process.

The paper is organized as follows. Sect. II is devoted to the introduction of the employed model and sequences, as
well as of the simulation protocols. In Sect. III it is explained how to combine the umbrella sampling technique [35]
with the weighted histogram analysis method [36] in order to recover the equilibrium free energy profile as a function
of an internal coordinate of the system. The inherent structure formalism and the extended Jarzynski equality are
briefly illustrated in Sect. IV and Sect. V, respectively. The thermodynamical properties of the studied sequences are
reported in Sect. VI, while Sect. VII (resp. Sect. VIII) is devoted to the free energy landscape reconstruction in
terms of the extended Jarzynski equality (resp. inherent structure approach). In Sect. VIII the two methods are also
compared and discussed. Finally, the results are summarized in Sect. IX.



II. MODEL AND SIMULATION PROTOCOL 



A. The model 



The model studied in this paper is a modified version of the 3d off-lattice model introduced by Honeycutt and Thirumalai
[37] and successively generalized by Berry et al. to include a harmonic interaction between next-neighbouring beads
instead of rigid bonds [38]. This model has been widely studied in the context of thermally driven folding and unfolding
[..., 37, ...] and only more recently for what concerns mechanical unfolding and refolding [41, 42]. The
model consists of a chain of L point-like monomers mimicking the residues of a polypeptidic chain. For the sake of
simplicity, only three types of residues are considered: hydrophobic (B), polar (P) and neutral (N) ones.

The intramolecular potential is composed of four terms: a stiff nearest-neighbour harmonic potential, V_1, intended
to maintain the bond distance almost constant, a three-body interaction V_2, which accounts for the energy associated
to bond angles, a four-body interaction V_3 corresponding to the dihedral angle potential, and a long-range
Lennard-Jones (LJ) interaction, V_4, acting on all pairs i, j such that |i − j| > 2, namely

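$$ V_1(r_{i,i+1}) = \alpha \, (r_{i,i+1} - r_0)^2 , \tag{1} $$

$$ V_2(\theta_i) = A \cos(\theta_i) + B \cos(2\theta_i) - V_0 , \tag{2} $$

$$ V_3(\varphi_i, \theta_i, \theta_{i+1}) = C_i \left[ 1 - S(\theta_i) S(\theta_{i+1}) \cos(\varphi_i) \right] + D_i \left[ 1 - S(\theta_i) S(\theta_{i+1}) \cos(3\varphi_i) \right] , \tag{3} $$

$$ V_4(r_{i,j}) = \varepsilon_{i,j} \left( \frac{1}{r_{i,j}^{12}} - \frac{c_{i,j}}{r_{i,j}^{6}} \right) . \tag{4} $$
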
Here, r_{i,j} is the distance between the i-th and the j-th monomer, while θ_i and φ_i are the bond and dihedral angles at
the i-th monomer, respectively. The parameters α = 50 and r_0 = 1 (both expressed in adimensional units) fix the
strength of the harmonic force and the equilibrium distance between subsequent monomers (which, in real proteins,
is of the order of a few Å). The value of α is chosen to ensure a value for V_1 much larger than the other terms of
the potential, in order to reproduce the stiffness of the protein backbone. The expression for the bond-angle potential
term V_2(θ_i), Eq. (2), corresponds, up to second order, to a harmonic interaction term ~ k_θ (θ_i − θ_0)²/2, where

$$ A = -k_\theta \frac{\cos(\theta_0)}{\sin^2(\theta_0)} , \qquad B = \frac{k_\theta}{4 \sin^2(\theta_0)} , \qquad V_0 = A \cos(\theta_0) + B \cos(2\theta_0) , \tag{5} $$

with k_θ = 20ε_h, θ_0 = 5π/12 rad or 75°, and where ε_h sets the energy scale. This formulation in terms of cosines
speeds up the simulation, since it is sufficient to evaluate cos(θ_i) and the value of the bond angle itself is not needed;
at the same time it avoids spurious divergences in the force expression due to the vanishing of sin(θ_i) when three
consecutive atoms become aligned [43].

FIG. 1: (Color online) Dihedral angle potential, V_3, when two or more beads among the four defining φ are neutral (red dashed
curve), and in all the other cases (blue solid curve). We fixed ε_h = 1 and S(θ_i) = S(θ_{i+1}) = 0.

The dihedral angle potential is characterized by three minima, for φ = 0 (associated to a so-called trans state)
and φ = ±2π/3 (corresponding to gauche states); this potential is mainly responsible for the formation of secondary
structures. In particular, large values of the parameters C_i, D_i favor the formation of the trans state and therefore of
β-sheets, while when gauche states prevail α-helices are formed. The parameters (C_i, D_i) have been chosen as in [39],
i.e. if two or more beads among the four defining φ are neutral (N) then C_i = 0 and D_i = 0.2ε_h, in all the other cases
C_i = D_i = 1.2ε_h (see Fig. 1). The tapering function S(θ_i) = 1 − cos^32(θ_i) has been introduced in the expression of V_3
in order to cure a well known problem of dihedral potentials [43]. This problem is encountered whenever θ_i = 0
or π, i.e. when three consecutive beads lie on the same line: in these situations the associated dihedral angle is no
longer defined and a discontinuity in V_3 arises. In contrast to what is reported in [43], this situation is not improbable for
the present model. The quantity S(θ_i)S(θ_{i+1}) entering in the definition of V_3 has a limited influence on the dynamics,
except in proximity of the above mentioned extreme cases. Moreover, S(θ_i)S(θ_{i+1}) is C^∞, its value is essentially one
for almost any θ_i, it does not introduce any extra minima in the potential, and it vanishes smoothly for θ_i → 0 or
θ_i → π.
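The two angular terms can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that the coefficients A, B and V_0 are fixed by requiring V_2 to be harmonic with stiffness k_θ around θ_0, and that the tapering function is S(θ) = 1 − cos^32(θ) as stated above. All function names are illustrative.

```python
import numpy as np

EPS_H = 1.0                      # energy scale (set to 1 in the text)
K_THETA = 20.0 * EPS_H           # bond-angle stiffness k_theta
THETA_0 = 5.0 * np.pi / 12.0     # equilibrium bond angle (75 degrees)


def bond_angle_coefficients(k_theta, theta_0):
    """Return (A, B, V0) making V2 harmonic (stiffness k_theta) around theta_0."""
    sin2 = np.sin(theta_0) ** 2
    A = -k_theta * np.cos(theta_0) / sin2
    B = k_theta / (4.0 * sin2)
    V0 = A * np.cos(theta_0) + B * np.cos(2.0 * theta_0)
    return A, B, V0


def v2(cos_theta, A, B, V0):
    """Bond-angle energy from cos(theta) alone, via cos(2x) = 2 cos(x)^2 - 1."""
    return A * cos_theta + B * (2.0 * cos_theta ** 2 - 1.0) - V0


def taper(theta):
    """Tapering function S(theta); it vanishes for aligned beads (theta = 0, pi)."""
    return 1.0 - np.cos(theta) ** 32


def v3(phi, theta_i, theta_next, C, D):
    """Dihedral energy with the tapering factor S(theta_i) S(theta_{i+1})."""
    s = taper(theta_i) * taper(theta_next)
    return C * (1.0 - s * np.cos(phi)) + D * (1.0 - s * np.cos(3.0 * phi))


def periodic_local_minima(phi, values):
    """Grid points that lie strictly below both neighbours (periodic grid)."""
    below_prev = values < np.roll(values, 1)
    below_next = values < np.roll(values, -1)
    return phi[below_prev & below_next]


A, B, V0 = bond_angle_coefficients(K_THETA, THETA_0)
phi_grid = np.linspace(-np.pi, np.pi, 3600, endpoint=False)
# Generic case (C = D = 1.2) and neutral case (C = 0, D = 0.2), both with S = 1.
minima_generic = periodic_local_minima(phi_grid, v3(phi_grid, THETA_0, THETA_0, 1.2, 1.2))
minima_neutral = periodic_local_minima(phi_grid, v3(phi_grid, THETA_0, THETA_0, 0.0, 0.2))
```

With these parameters both cases show three minima per period. In the neutral case they sit at φ = 0 and ±2π/3, while for C_i = D_i = 1.2 the two gauche minima are slightly displaced from ±2π/3 by the cos(φ) term.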

The last term V_4 has been introduced to mimic effectively the interactions with the solvent; it is a Lennard-Jones
potential and it depends on the type of interacting residues as follows:

• if any of the two monomers is neutral the potential is repulsive, c_{N,X} = 0, and its scale of energy is fixed by
ε_{N,X} = 4ε_h;

• for interactions between hydrophobic residues c_{B,B} = 1 and ε_{B,B} = 4ε_h;

• for any polar-polar or polar-hydrophobic interaction c_{P,P} = c_{P,B} = −1 and ε_{P,P} = ε_{P,B} = (8/3)ε_h.

Accordingly, the Hamiltonian of the system reads

$$ H = K + V = \sum_{i=1}^{L} \frac{p_{x,i}^2 + p_{y,i}^2 + p_{z,i}^2}{2} + \sum_{i=1}^{L-1} V_1(r_{i,i+1}) + \sum_{i=2}^{L-1} V_2(\theta_i) + \sum_{i=2}^{L-2} V_3(\varphi_i, \theta_i, \theta_{i+1}) + \sum_{i=1}^{L-3} \sum_{j=i+3}^{L} V_4(r_{i,j}) , \tag{6} $$

where, for the sake of simplicity, all monomers are assumed to have the same unitary mass, the momenta are defined
as (p_{x,i}, p_{y,i}, p_{z,i}) = (ẋ_i, ẏ_i, ż_i) and we fix ε_h = 1.

TABLE I: Potential energy values associated to the NC of the GF and BF; the different contributions to the total potential
energy V_NC are also reported.

              GF          BF
  V_NC     -49.878     -23.956
  V_1        0.787       0.777
  V_2        1.767       5.744
  V_3        2.602      23.105
  V_4      -55.035     -53.582

In the present paper we consider the following two sequences of 46 monomers:

• [GF] = B_9N_3(PB)_4N_3B_9N_3(PB)_5P, a sequence that has been widely analyzed in the past for spontaneous folding
[...] as well as for mechanical unfolding and refolding [41, 42];

• [BF] = BNBPB_3NPB_4NBPB_2NP_2B_5N_2BPBNPB_2NBP_2BNB_2PB_2, a randomly generated sequence, but
with the same number of B, P and N monomers as the GF.

These two sequences have been chosen because GF has been previously identified as a reasonably fast folder [30]
(see also [38] for a detailed and critical analysis of the basin-bottom structures observed for this model), while we
expect that the sequence BF, being randomly chosen, cannot have the characteristics of a good folder. From now on
we refer to the sequence GF (resp. BF) as the good (resp. bad) folder.
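The composition of the two chains and the residue-dependent Lennard-Jones parameters can be tabulated in a few lines. This is a sketch under stated assumptions: the GF string is taken to be the 46-mer B9N3(PB)4N3B9N3(PB)5P, the pair potential is taken as V_4 = ε (r^-12 − c r^-6) with the (ε, c) values listed in Sect. II A, and the helper names `expand`, `lj_parameters`, `v4` and `nonbonded_pairs` are hypothetical.

```python
import re
from collections import Counter

GF = "B9N3(PB)4N3B9N3(PB)5P"                        # assumed form of the good folder
BF = "BNBPB3NPB4NBPB2NP2B5N2BPBNPB2NBP2BNB2PB2"     # random sequence from the text

# A token is either a parenthesised group or a single bead, each with an
# optional repetition count.
_TOKEN = re.compile(r"\(([BPN]+)\)(\d*)|([BPN])(\d*)")


def expand(compact):
    """Expand the compact notation into an explicit string of B, P, N beads."""
    pieces = []
    for group, group_n, bead, bead_n in _TOKEN.findall(compact):
        unit = group or bead
        count = int(group_n or bead_n or 1)
        pieces.append(unit * count)
    return "".join(pieces)


def lj_parameters(type_i, type_j):
    """Return (epsilon_ij, c_ij) in units of eps_h for a pair of bead types."""
    pair = {type_i, type_j}
    if "N" in pair:              # any pair involving a neutral bead: pure repulsion
        return 4.0, 0.0
    if pair == {"B"}:            # hydrophobic-hydrophobic: attractive well
        return 4.0, 1.0
    return 8.0 / 3.0, -1.0       # polar-polar and polar-hydrophobic: repulsive


def v4(r, type_i, type_j):
    """Pair energy V4 at distance r."""
    eps, c = lj_parameters(type_i, type_j)
    return eps * (r ** -12 - c * r ** -6)


def nonbonded_pairs(chain):
    """Index pairs (i, j) with j - i > 2, i.e. those entering the V4 sum."""
    n = len(chain)
    return [(i, j) for i in range(n) for j in range(i + 3, n)]


gf_chain, bf_chain = expand(GF), expand(BF)
```

Under these assumptions both chains contain 27 B, 9 N and 10 P beads, which is consistent with the statement that BF has the same composition as GF.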

The 46-mer sequence GF exhibits a four stranded β-barrel Native Configuration (NC) with an associated potential
energy V_NC = −49.878. Please note that the model is here analyzed by employing the same potential and parameter
set reported in Ref. [...], but neglecting any diversity among the hydrophobic residues. The NC, displayed in Fig.
[...](a), is stabilized by the attractive hydrophobic interactions among the B residues; in particular the first and third
B_9 strands, forming the core of the NC, are parallel to each other and anti-parallel to the second and fourth strand,
namely, (PB)_4 and (PB)_5P. The latter strands are exposed towards the exterior due to the presence of polar residues.

As shown in Fig. [...], the native structure of the BF is quite different: it has a core constituted by the first three
β-strands and a very long "tail" (made of 18 residues) wrapped around the core. In particular, the first and second
β-strands (namely, BNBPB_3NP and B_4NBPB_2) are formed by 9 residues each and are anti-parallel to each other. For more
clarity, we will term π_1 the plane containing the first 2 strands. The third strand (namely, P_2B_5) is made of 7 residues
and it is located in a plane lying in-between the first and second strand, which is almost perpendicular to π_1. The
chain rotates by almost 90 degrees in correspondence of the two consecutive neutral beads and then exhibits a short
strand of 3 beads PBP before turning back with a parallel strand of 7 beads (PB_2NBP_2) that passes below π_1.
Finally the chain turns back once more, passing this time above the plane π_1. In the final part of the tail of the
chain a short strand of 5 residues, parallel to the 4-th and 5-th strands, can be identified as B_2PB_2. The potential
energy of the NC of the BF is quite high with respect to the GF, namely V_NC = −23.956. Moreover, this difference,
as reported in Table I, is essentially due to the dihedral contribution, which is much higher in the NC
of the BF than in that of the GF, while all the other contributions, in particular the LJ ones, have similar values.
The dihedral contribution that arises in the BF is essentially due to the configuration of the first 3 strands, since
these are arranged over two almost orthogonal planes.

B. Simulation protocol: equilibrium Langevin dynamics 

Molecular dynamics (MD) canonical simulations at equilibrium temperature T have been performed by integrating
the corresponding Langevin equation for each monomer of unitary mass (characterized by the position vector r_i):

$$ \ddot{\mathbf{r}}_i = \mathbf{F}(\mathbf{r}_i) - \gamma \dot{\mathbf{r}}_i + \boldsymbol{\eta}(t) , \qquad i = 1, \dots, L \tag{7} $$

where η(t) is a zero average Gaussian noise term with correlations given by ⟨η_α(t) η_β(t')⟩ = 2Tγ δ(t − t') δ_{α,β}; F = −∇V,
V being the intramolecular potential introduced in Sect. II A and γ the friction coefficient associated to the solvent; the
Boltzmann constant is set to unity.



Numerical integrations have been implemented via a standard Euler scheme with a time-step Δt = 0.005 and
with a low friction coefficient γ = 0.05 [39]. Two different kinds of MD have been performed, namely unfolding
simulations (US) and folding simulations (FS). In the first case the initial state of the system is taken equal to the
native configuration (NC), that we assume to coincide with the minimal energy configuration. In the latter one the
initial state is a completely unfolded configuration.
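A single integration step of the Langevin equation can be sketched as follows. The text only specifies "a standard Euler scheme", so the explicit update below (often called Euler-Maruyama) is an assumption about its form; the function names and the `force` callback are illustrative, while Δt = 0.005 and γ = 0.05 are the values quoted above.

```python
import numpy as np


def euler_langevin_step(r, v, force, T, gamma, dt, rng):
    """Advance positions r and velocities v (unit masses, k_B = 1) by one step."""
    # Discretised noise: variance 2*T*gamma*dt per Cartesian component.
    kick = np.sqrt(2.0 * T * gamma * dt) * rng.standard_normal(v.shape)
    v_new = v + dt * (force(r) - gamma * v) + kick
    r_new = r + dt * v            # explicit Euler: uses the old velocity
    return r_new, v_new


def run(r0, v0, force, T, n_steps, gamma=0.05, dt=0.005, seed=0):
    """Integrate n_steps starting from (r0, v0); returns the final (r, v)."""
    rng = np.random.default_rng(seed)
    r, v = np.array(r0, dtype=float), np.array(v0, dtype=float)
    for _ in range(n_steps):
        r, v = euler_langevin_step(r, v, force, T, gamma, dt, rng)
    return r, v
```

In a production run `force` would return −∇V for the full intramolecular potential; at T = 0 and zero force the scheme reduces to a pure frictional decay of the velocities, which gives a simple deterministic check.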

C. Simulation protocol: out-of-equilibrium mechanical unfolding 

In order to mimic the mechanical pulling of the protein attached to an AFM cantilever, or analogously when trapped
in an optical tweezer, one extremum of the chain is kept fixed and the last bead is attached to a pulling apparatus
with a spring of elastic constant k. The external force is applied by moving the "cantilever" along a fixed direction
with a certain protocol z(t). Before pulling the protein, the coordinate system is always rigidly rotated, in order to
have the z-axis aligned along the end-to-end direction connecting the first and last bead. Therefore, by denoting with
ζ the end-to-end distance, the component of the external force along this direction reads as

$$ F_{ext} = k (z - \zeta) \tag{8} $$

where k = 10 in order to suppress fast oscillations. As recently pointed out, it is extremely important to use a sample
of thermally equilibrated initial configurations to correctly reproduce the equilibrium FEL via the JE [...]. Therefore,
before pulling the protein, we have performed a thermalization procedure in two steps. At a fixed temperature T,
initially the protein evolves freely starting from the NC for a time t = 1,000, then it is attached to the external
apparatus, with the first bead blocked, and it equilibrates for a further time period t = 500. The system
(at sufficiently low temperatures) quickly settles down to a "native-like" configuration. This configuration is then
employed as the starting state for the forced unfolding. The protocol that we have used is a linear pulling protocol
with a constant speed v_p, i.e. z(t) = z(0) + v_p × t, by assuming that the pulling starts at t = 0. Usually we have
employed velocities v_p ∈ [5 × 10^-6 : 5 × 10^-2] and set z(0) = ζ_0, i.e. to the end-to-end distance associated to the
native configuration.
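The linear protocol and the work it performs can be written down in a few lines. This is a sketch under one assumption not spelled out above: the spring energy is taken as k (z − ζ)²/2, so that its derivative with respect to the controlled position z is the force of Eq. (8) and the work is the time integral of v_p F_ext. The function names are illustrative.

```python
def pulling_position(t, z0, v_p):
    """Cantilever position for the linear protocol z(t) = z(0) + v_p * t."""
    return z0 + v_p * t


def external_force(z, zeta, k=10.0):
    """Spring force along the pulling direction, Eq. (8)."""
    return k * (z - zeta)


def accumulated_work(zeta_of_t, z0, v_p, dt, k=10.0):
    """Left Riemann sum of W = int v_p * k * (z(t') - zeta(t')) dt'.

    zeta_of_t holds the end-to-end distance sampled every dt along one
    pulling trajectory, starting at t = 0.
    """
    work = 0.0
    for n, zeta in enumerate(zeta_of_t):
        z = pulling_position(n * dt, z0, v_p)
        work += v_p * external_force(z, zeta, k) * dt
    return work
```

For a hypothetical rigid molecule (ζ constant and equal to z(0)) the force grows linearly in time and the sum can be evaluated in closed form, which is a convenient sanity check before feeding real trajectories to a free energy estimator.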

III. WEIGHTED HISTOGRAM ANALYSIS METHOD 

A combination of the umbrella sampling technique [35] with the weighted histogram analysis method (WHAM) [36]
allows one to obtain the equilibrium free energy profile as a function of the end-to-end distance.

The umbrella sampling technique [35] amounts to performing a series of biased molecular dynamics simulations of
the system constrained by an external potential, namely

$$ w_i(\zeta) = \frac{1}{2} k_W (\zeta - \bar{\zeta}_i)^2 . \tag{9} $$

The potential w_i forces the heteropolymer to stay in configurations characterized by a certain average end-to-end
distance ζ̄_i, even if at the considered temperature such a ζ-value is highly unfavored. These simulations allow one to obtain
a series of M biased end-to-end probability density distributions p_i^b(ζ) {i = 1, ..., M}, which, properly combined,
permit the reconstruction of the equilibrium unbiased p(ζ). In particular, in the case of identical statistics for each
biased run the WHAM formalism prescribes the following combination

$$ p(\zeta) = \frac{\sum_{i=1}^{M} p_i^b(\zeta)}{\sum_{i=1}^{M} e^{-\beta [w_i(\zeta) - F_i]}} \equiv e^{-\beta f_W(\zeta, T)} \tag{10} $$

where β = 1/T and the free energy constants {F_i} can be obtained by the normalization condition

$$ e^{-\beta F_i} = \int d\zeta \, e^{-\beta w_i(\zeta)} \, p(\zeta) . \tag{11} $$

Eqs. (10) and (11) should be solved self-consistently via an iterative procedure; this finally gives an estimate
of the equilibrium free energy f_W(ζ, T), apart from an additive constant.
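The self-consistent solution of the two WHAM equations can be sketched on a grid as below. This is an illustration rather than the authors' implementation: it assumes equal statistics in every window, harmonic biases w_i(ζ) = k_W (ζ − ζ̄_i)²/2, and densities tabulated on a uniform ζ grid; the function names and the synthetic double-well profile are hypothetical.

```python
import numpy as np


def harmonic_bias(zeta, centers, k_w):
    """Bias energies w_i(zeta) for every window; shape (M, n_grid)."""
    return 0.5 * k_w * (zeta[None, :] - centers[:, None]) ** 2


def wham_update(p_biased, bias, F, beta, dz):
    """One pass: unbiased p from the current F_i, then new F_i from p."""
    denom = np.exp(-beta * (bias - F[:, None])).sum(axis=0)
    p = p_biased.sum(axis=0) / denom
    p /= p.sum() * dz                      # fix the arbitrary normalisation
    F_new = -np.log((np.exp(-beta * bias) * p).sum(axis=1) * dz) / beta
    return p, F_new


def wham(p_biased, bias, beta, dz, tol=1e-8, max_iter=10000):
    """Iterate until the constants F_i stop changing (or max_iter is hit)."""
    F = np.zeros(p_biased.shape[0])
    for _ in range(max_iter):
        p, F_new = wham_update(p_biased, bias, F, beta, dz)
        converged = np.max(np.abs(F_new - F)) < tol
        F = F_new
        if converged:
            break
    return p, F


# Synthetic consistency check: biased densities built from a known profile.
T = 0.5
beta = 1.0 / T
zeta = np.linspace(0.0, 4.0, 401)
dz = zeta[1] - zeta[0]
f_true = (zeta - 1.0) ** 2 * (zeta - 3.0) ** 2        # hypothetical double well
p_true = np.exp(-beta * f_true)
p_true /= p_true.sum() * dz
centers = np.arange(0.0, 4.0 + 1e-9, 0.2)             # window spacing 0.2
bias = harmonic_bias(zeta, centers, k_w=10.0)
p_biased = p_true * np.exp(-beta * bias)
p_biased /= p_biased.sum(axis=1, keepdims=True) * dz
F_true = -T * np.log((np.exp(-beta * bias) * p_true).sum(axis=1) * dz)
```

The free energy profile then follows as f_W = −T ln p. With exact biased densities the true pair (p, {F_i}) is a fixed point of `wham_update`; with sampled histograms the iteration converges to the statistically optimal combination only up to sampling noise.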



We have considered equally spaced {ζ̄_i}-values, with a separation Δζ = 0.2 among them, ranging from the native
configuration ζ_0 to the all trans-configuration ζ_trans (see footnote 1). For each of the M runs, after a quite long equilibration time
t ~ 120,000 - 200,000, we have estimated p_i^b(ζ) over 100,000 configurations taken at regular time intervals Δt = 0.2.
The biased simulations have been performed with a hard and a weak spring, corresponding to k_W = 10 and 0.5 in
Eq. (9), respectively. The results obtained essentially agree for the two k_W-values, except when the free energy landscape
exhibits steep increases as a function of ζ. In these cases the hard spring is more appropriate, since the weak one
allows the protein to refold, thus rendering the ζ-intervals where f_W(ζ) is steeper not accessible to the WHAM
reconstruction.



IV. INHERENT STRUCTURE FORMALISM 



Inherent structures correspond to local minima of the potential energy; in particular the phase space visited by
the protein during its dynamical evolution can be decomposed into disjoint attraction basins, each corresponding to
a specific IS. Therefore, the canonical partition function can be expressed within the IS formalism as a sum over the
non overlapping basins of attraction, each corresponding to a specific minimum (IS) a [...]:

$$ Z_{IS}(T) = \frac{1}{\lambda^{N'}} \sum_a e^{-\beta V_a} \int_{\Gamma_a} e^{-\beta \Delta V_a(\Gamma)} \, d\Gamma = \sum_a e^{-\beta [V_a + R_a(T)]} \tag{12} $$

where N' is the number of degrees of freedom of the system, λ is the thermal wavelength, Γ represents one of the
possible conformations of the protein within the basin of attraction of a, V_a is the potential energy associated to the
minimum a, ΔV_a(Γ) = V(Γ) − V_a and R_a(T) is the vibrational free energy due to the fluctuations around the minimum.
The vibrational term R_a(T) can be estimated by assuming a harmonic basin of attraction:

$$ e^{-\beta R_a(T)} = \frac{1}{\lambda^{N'}} \int_{\Gamma_a} e^{-\beta \Delta V_a(\Gamma)} \, d\Gamma = \prod_{j=1}^{3N-6} \frac{T}{\omega_a^j} \tag{13} $$

where ω_a^j are the frequencies of the vibrational modes around the IS a and a unitary reduced Planck constant has
been considered.

Therefore the probability to be in the basin of attraction of the IS a is

$$ p_a(T) = \frac{1}{Z_{IS}(T)} \, e^{-\beta [V_a + R_a(T)]} . \tag{14} $$

The free energy of the whole system at equilibrium is simply given by f_IS(T) = −T ln[Z_IS(T)]. However, if one is
interested in constructing a free energy landscape as a function of a parameter characterizing the different ISs, like e.g.
the Kabsch distance δ_K [...] or the end-to-end distance ζ, this is possible by defining a partition function restricted
to ISs with an end-to-end distance within the narrow interval [ζ; ζ + dζ]

$$ Z_{IS}(\zeta, T) = {\sum_a}' \, e^{-\beta [V_a + R_a(T)]} \tag{15} $$

where the primed sum indicates that the sum is not over the whole ensemble of ISs {a} but restricted to those whose
end-to-end distance lies in the chosen interval. The free energy profile as a function of ζ can be simply obtained
by the relationship:

$$ f_{IS}(\zeta, T) = -T \ln[Z_{IS}(\zeta, T)] ; \tag{16} $$


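while the average potential and free vibrational energy, corresponding to ISs characterized by a certain ζ, can be
estimated as follows:

$$ V_{IS}(\zeta, T) = \frac{{\sum_a}' \, V_a \, e^{-\beta [V_a + R_a(T)]}}{Z_{IS}(\zeta, T)} , \qquad R_{IS}(\zeta, T) = \frac{{\sum_a}' \, R_a(T) \, e^{-\beta [V_a + R_a(T)]}}{Z_{IS}(\zeta, T)} . \tag{17} $$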
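Given a data bank of minima, the restricted sums of this section reduce to a weighted binning. The sketch below assumes that each IS is described by its energy V_a, its end-to-end distance ζ_a and its harmonic frequencies, with k_B and the reduced Planck constant set to one as in the text; the function names and the binning in ζ are illustrative choices.

```python
import numpy as np


def vibrational_free_energy(omegas, T):
    """Harmonic R_a(T), from exp(-R_a/T) = prod_j (T / omega_j)."""
    return T * np.sum(np.log(np.asarray(omegas, dtype=float) / T))


def is_free_energy_profile(zeta, V, R, T, bin_edges):
    """Return (f_IS, V_IS, R_IS) per zeta bin from the arrays of IS data."""
    zeta, V, R = (np.asarray(a, dtype=float) for a in (zeta, V, R))
    beta = 1.0 / T
    log_w = -beta * (V + R)
    shift = log_w.max()                       # guard against overflow
    w = np.exp(log_w - shift)
    n_bins = len(bin_edges) - 1
    idx = np.digitize(zeta, bin_edges) - 1    # bin index of every IS
    inside = (idx >= 0) & (idx < n_bins)
    Z = np.zeros(n_bins)
    VZ = np.zeros(n_bins)
    RZ = np.zeros(n_bins)
    np.add.at(Z, idx[inside], w[inside])
    np.add.at(VZ, idx[inside], (w * V)[inside])
    np.add.at(RZ, idx[inside], (w * R)[inside])
    with np.errstate(divide="ignore", invalid="ignore"):
        f = -T * (np.log(Z) + shift)          # +inf in empty bins
        return f, VZ / Z, RZ / Z              # averages are nan in empty bins
```

Because every distinct IS enters the sum once, the quality of the profile depends on how completely the data bank covers the minima in each ζ interval.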



1 This is an elongated (planar) equilibrium conformation of the protein with all the dihedral angles at their trans values, corresponding
to ζ_trans = 35.70.



TABLE II: Number of distinct ISs contained in the PDB at different temperatures. These have been obtained by sampling,
during out-of-equilibrium mechanical unfoldings, several Langevin trajectories at constant elongation increments δζ = 0.1. The
total number of relaxations performed for each temperature amounts to ~ 60,000, corresponding to ~ 200 repetitions of the
same pulling experiment. The considered experiments have been performed at v_p = 5 × 10^-4 for the GF, while velocities in the
range v_p ∈ [5 × 10^-5 : 5 × 10^-4] have been employed for the BF. For the bad folder not all temperatures have been examined.

    T      good folder    bad folder
   0.1        2,843           456
   0.2        5,875         1,763
   0.3       12,359         6,477
   0.4       35,409        21,060
   0.5       52,546        45,950
   0.6       51,971            -
   0.7       54,736            -

In order to find the different ISs one can perform MC samplings or molecular dynamics (MD) simulations. We
have chosen to examine MD trajectories at constant temperature via a Langevin integration scheme. In particular,
we have built up two data banks of ISs: the thermal data bank (TDB) obtained by performing equilibrium canonical
simulations and the pulling data bank (PDB) by mechanically unfolding the protein. In order to find the different
ISs the equilibrium (resp. out-of-equilibrium) Langevin trajectory is sampled at constant time intervals δt = 5 (resp.
at constant elongation increments δζ = 0.1) to pinpoint a series of configurations, which afterward are relaxed via
a steepest descent dynamics and finally refined by means of a standard Newton's method. In the case of the TDB,
in order to speed up the search of ISs we have employed a so-called "quasi-Newton" method [46] (see footnote 2). For mechanical
unfolding, the protein is unblocked and the pulling apparatus removed before the relaxation stage. Two local minima
are identified as distinct whenever their energies differ by more than 1 × 10^-5. The TDB for the good (resp. bad) folder
contains 579,749 (resp. 210,782) distinct ISs collected via equilibrium simulations at various temperatures in the
range [0.3; 2.0]. The PDB contains 3,000 - 50,000 ISs depending on the examined temperature, as detailed in
Table II.

2 The comparison between the steepest descent and the quasi-Newton methods has revealed that this second minimization scheme is
somewhat faster (1.8 times faster at T = 0.5 for the good folder), but while the steepest descent algorithm is able to identify the
metastable stationary states in 99.8% of the examined cases, the quasi-Newton scheme was successful in 98.7% of the situations.
However, the distributions of the identified minima (by considering the same trajectory) obtained with the two schemes are essentially
coincident.

V. EXTENDED JARZYNSKI EQUALITY

In the present section, we discuss an extended version of the Jarzynski equality, which allows one to obtain the free
energy profile as a function of a collective coordinate. Let x be the variable that identifies the system microscopic
state, e.g. the collection of the positions and momenta of all the particles in the system x = {r_i, p_i}. The system
Hamiltonian is a function of x, and will be indicated as H_0(x) in the following. Let X(x) be a macroscopic observable
of the system, e.g. the volume, and let us assume that the system is subject to an external potential U_λ(X), which is a
function of X, and which depends on a parameter λ whose value is externally controlled. The parameter λ changes
according to a given time protocol λ(t), and thus the system is characterized by a time dependent Hamiltonian
H(x, t) = H_0(x) + U_{λ(t)}(X(x)). The thermodynamic work done on the system, as the external parameter λ changes,
reads

$$ W_t = \int_0^t dt' \, \dot{\lambda}(t') \, \partial_\lambda U_\lambda \big( X(x(t')) \big) \Big|_{\lambda = \lambda(t')} . \tag{18} $$

Due to thermal fluctuations, W_t varies from one realization of the manipulation process to another.

We now introduce the function f(X, T), which is the free energy of the constrained ensemble, in which the value
X(x) is fixed at X:

$$ f(X, T) = -k_B T \ln \int dx \, \delta(X - X(x)) \, e^{-\beta H_0(x)} . \tag{19} $$

The extended Jarzynski equality relates the work done on the system, as an effect of the change in the external
parameter λ, with the free energy f(X, T) [...]:

$$ Z_0 \, e^{\beta U_{\lambda(t)}(X)} \left\langle \delta(X - X(x)) \, e^{-\beta W_t} \right\rangle_t = e^{-\beta f(X, T)} \tag{20} $$

where Z_0 = ∫ dx exp[−βH_0(x)] is the partition function associated with the time-independent Hamiltonian H_0(x)
and the averages ⟨·⟩_t are taken over many realizations of the same protocol at time t. Equation (20) thus provides
a method to evaluate the unperturbed free energy f(X, T) as long as one has a reliable estimate of the lhs of this
equation. It is worth noting that one does not need to evaluate the partition function Z_0 to evaluate f(X, T), as it
appears only as a multiplicative constant in Eq. (20).

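The optimal estimate of f(X, T) can be obtained by combining Eq. (20) with the previously discussed method of
weighted histograms [..., 48], namely

$$ f_J(X, T) = -k_B T \ln \left[ \frac{\sum_t \dfrac{\left\langle \delta(X - X(x)) \exp(-\beta W_t) \right\rangle_t}{\left\langle \exp(-\beta W_t) \right\rangle_t}}{\sum_t \dfrac{\exp(-\beta U(X, t))}{\left\langle \exp(-\beta W_t) \right\rangle_t}} \right] \tag{21} $$

where U(X, t) = U_{λ(t)}(X) and the sums are over successive time snapshots. For a detailed derivation of Eq. (21) see [...].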
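The combination of work-weighted histograms over time snapshots can be sketched as follows. The sketch assumes that the data are stored as arrays `X[k, t]` and `W[k, t]` (collective coordinate and accumulated work of trajectory k at snapshot t), that the delta function is replaced by a histogram over user-chosen bins, and that k_B = 1; the names `eje_free_energy`, `log_mean_exp` and the `bias` callback (returning U(X, t) at the bin centers) are hypothetical.

```python
import numpy as np


def log_mean_exp(a):
    """Numerically stable ln(mean(exp(a)))."""
    a_max = np.max(a)
    return a_max + np.log(np.mean(np.exp(a - a_max)))


def eje_free_energy(X, W, bias, bin_edges, T):
    """Histogram-based estimate of the free energy profile (up to a constant).

    X, W : arrays of shape (n_trajectories, n_snapshots)
    bias : callable bias(centers, t) -> U(X, t) evaluated at the bin centers
    """
    beta = 1.0 / T
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    widths = np.diff(bin_edges)
    numerator = np.zeros(centers.size)
    denominator = np.zeros(centers.size)
    n_traj, n_snap = X.shape
    for t in range(n_snap):
        log_w = -beta * W[:, t]
        log_avg = log_mean_exp(log_w)                 # ln <exp(-beta W_t)>_t
        weights = np.exp(log_w - log_avg) / n_traj    # normalised to sum to 1
        hist, _ = np.histogram(X[:, t], bins=bin_edges, weights=weights)
        numerator += hist / widths                    # <delta e^{-bW}> / <e^{-bW}>
        denominator += np.exp(-beta * bias(centers, t) - log_avg)
    with np.errstate(divide="ignore"):
        f = -T * (np.log(numerator) - np.log(denominator))
    return centers, f                                 # +inf where no data fall
```

Since the exponential averages are dominated by rare low-work trajectories, in practice the number of repetitions needed grows quickly with the pulling velocity; the estimator itself is agnostic to this and only reports what the sampled trajectories support.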



VI. THERMODYNAMICAL PROPERTIES



The main thermodynamical features of the examined model can be summarized by reporting three different
transition temperatures [1, 11, ...]: namely, the hydrophobic collapse temperature T_θ, the folding temperature T_f,
and the glassy temperature T_g.

The collapse temperature discriminates between phases dominated by random-coil configurations and by
collapsed ones [51]. T_θ has been usually identified as the temperature where the heat capacity C(T) reaches its maximal
value, namely (within the canonical formalism):

$$ C(T_\theta) \equiv C_{max} , \qquad \text{where} \qquad C(T) = \frac{\langle E^2 \rangle - \langle E \rangle^2}{T^2} , \tag{22} $$

and ⟨·⟩ represents a time average performed over an interval t ~ 10^5 by following an US trajectory. From Fig. 2
it is evident that for both sequences C(T) ~ 138 up to temperatures T ~ 0.25. This result can be understood
by noticing that at low temperatures the thermal features of heteropolymers resemble those of a disordered 3D solid,
with an associated heat capacity C_sol = 3L. Moreover, the high temperature values are smaller than C_sol, since in
this limit we expect that a one dimensional chain in a three dimensional space would have a specific heat C = 2L
[49]. However, as shown in Fig. 2, these extreme temperatures have not yet been reached. The comparison of the
heat capacity curves for the GF and BF reveals that C(T) obtained for the GF has a much broader peak with respect
to the BF. This indicates that the transition from the NC to the random coil state is definitely sharper for the bad
folder.
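The fluctuation formula for the heat capacity translates directly into code. The sketch below assumes that the total energy has been recorded along an equilibrium trajectory at each temperature and that k_B = 1; the function names are illustrative and the numbers in the usage lines are made up for the check, not simulation data.

```python
import numpy as np


def heat_capacity(energies, T):
    """C(T) = (<E^2> - <E>^2) / T^2 from a time series of total energies."""
    e = np.asarray(energies, dtype=float)
    return (np.mean(e ** 2) - np.mean(e) ** 2) / T ** 2


def collapse_temperature(temperatures, capacities):
    """Temperature at which the sampled heat capacity is largest."""
    return temperatures[int(np.argmax(capacities))]


L = 46
C_SOLID = 3 * L          # low-temperature (harmonic solid) value, 138
C_CHAIN = 2 * L          # expected high-temperature limit quoted in the text
```

On a coarse temperature grid the location of the maximum is only resolved to the grid spacing, so a finer scan (or a fit) around the peak would be needed for a precise T_θ.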

The folding temperature has been defined in many different ways [...]; however, we have chosen to define
the folding temperature by employing the IS reconstruction of the phase space. In practice, quite long USs have been
performed at various temperatures, up to duration t = 5,000,000. During each of these USs the visited ISs have been
identified at regular intervals δt = 5, and from these data we have estimated the probability P_nc(T) to visit the NC
at such temperature. The folding temperature T_f is then defined as

$$ P_{nc}(T_f) = 0.5 . \tag{23} $$

Indeed, it should be noticed that for the GF P_nc is the probability to stay in the two lowest lying energy minima
(ISs) and not in the NC only. These two minima can be associated to a unique attraction basin, since their energy
separation is extremely small with respect to |V_NC| (namely, 0.04) and also the corresponding configurations are
almost identical, being separated by a Kabsch distance δ_K = 0.128. Moreover, at any examined temperature we have
always observed a rapid switching between the two configurations, indicating that there is an extremely low energy
barrier between these two states.
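The definition of T_f can be turned into a small post-processing step. The sketch below assumes that each sampled configuration has already been quenched to its IS and labelled by the IS energy, that native minima are recognized by their energy within the tolerance of 10^-5 used to distinguish minima, and that P_nc(T) is interpolated linearly between sampled temperatures; the function names and all numbers in the checks are hypothetical.

```python
def native_probability(is_energies, native_energies, tol=1e-5):
    """Fraction of sampled ISs whose energy matches one of the native minima."""
    hits = sum(
        any(abs(e - v) < tol for v in native_energies) for e in is_energies
    )
    return hits / len(is_energies)


def folding_temperature(temperatures, p_nc, threshold=0.5):
    """Linear interpolation of the temperature where P_nc crosses threshold.

    Returns None if the sampled curve never crosses the threshold.
    """
    points = list(zip(temperatures, p_nc))
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        crosses = (p0 - threshold) * (p1 - threshold) <= 0.0
        if crosses and p0 != p1:
            return t0 + (threshold - p0) * (t1 - t0) / (p1 - p0)
    return None
```

For the good folder the list `native_energies` would contain the two lowest minima discussed above, so that rapid switching between them is not counted as leaving the native basin.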
FIG. 2: (Color online) Heat capacity C as a function of the temperature T for good (a) and bad (b) folder; the vertical (red)
dotted line indicates the hydrophobic collapse temperature T_θ and the horizontal (black) dashed line the value C_sol.

FIG. 3: (Color online) Probability P_nc as a function of the temperature T for good (a) and bad (b) folder; the vertical (magenta)
dashed line indicates the folding temperature T_f, while the horizontal (black) dotted line refers to the value 0.5.



The glassy temperature T_g indicates the temperature below which freezing of large conformational rearrangements 
occurs: below such a temperature the system can be trapped in local minima of the potential. Following [49], 
in order to locate T_g we have compared the results obtained from FSs and USs. In particular, we have 
examined, at the same temperatures, the average total energy ⟨E⟩ of the system evaluated over finite time intervals. 
As shown in Fig. 4, these quantities, when obtained from USs and FSs, coincide at temperatures larger than T_g, 
below which the structural arrest takes place. In particular, unfolding averages have been performed over intervals 
of duration t = 10^5 by following a single trajectory. On the other hand, folding simulations have been followed up to 
times t ≈ 1.1 × 10^7 and the averages taken over 5-7 different initial conditions by considering for each trajectory only 
the last time span of duration t ≈ 5 × 10^4. The error bars (standard deviation) shown in Fig. 4 should be interpreted, 
at sufficiently low temperatures, as a sign of the dependence of the results on the initial conditions. 
The three transition temperatures estimated for the good and bad folder are reported in Table III (see footnote 3). One can notice 
that T_θ is larger for the good folder, thus indicating that the collapsed state has a greater stability with respect to the 



3 In [31] for the sequence GF it has been found T_θ = 0.65 and T_f ≈ 0.34; moreover, in the same paper the authors suggested that 
the folding transition was associated with a shoulder in C, but this result has been recently criticized [40]. More recent 
estimates, obtained by employing different protocols, suggest that T_f ≈ 0.24-0.25 [9, 11] and T_g ≈ 0.15, values that are essentially 
in agreement with our results. 
FIG. 4: (Color online) Total energy ⟨E⟩ as a function of the temperature T for good (a) and bad (b) folder; the solid (red) 
line corresponds to USs and the (blue) symbols to FSs. In the inset, an enlargement for low temperatures: the dashed lines 
indicate the glassy (T_g) (magenta) and folding (T_f) (green) temperatures. 





          GF          BF 
T_θ       0.65(1)     0.46(2) 
T_f       0.255(5)    0.24(1) 
T_g       0.12(2)     0.27(2) 



TABLE III: Transition temperatures estimated for good and bad folder with the corresponding error. 



bad folder. Moreover, while for the good folder T_f > T_g, for the bad one this order is reversed. Therefore the BF will 
most likely remain trapped in some misfolded configurations before reaching the NC, even at temperatures T ≲ T_f. 



VII. EXTENDED JARZYNSKI EQUALITY RECONSTRUCTION 

In this section we present, for both sequences GF and BF, the reconstruction of the FEL, at various temperatures, 
as a function of the end-to-end distance ζ starting from out-of-equilibrium measurements. The free energy profiles 
have been obtained via the EJE by averaging over 28-250 repetitions of the same pulling protocol, depending on the 
pulling velocity, as described in Sect. III B. We have generally used the pulling configuration where the first bead 
is kept fixed and the 46th bead is pulled (tail-pulled case). However, by considering the head-pulled case, where the 
roles of the first and last bead are reversed, we obtain, for sufficiently low velocities (namely, v_p < 5 x 10 for the 
GF and v_p < 5 × 10^-5 for the BF), exactly the same free energy profile. These results are essentially in agreement 
with those reported in [42] for the GF. 
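
The core of any Jarzynski-type estimate is an exponential average of the work over repeated realizations of the pulling protocol. The Python sketch below shows only this step, f = -T ln⟨exp(-W/T)⟩, evaluated in a numerically stable way. It is a simplified illustration: the extended equality described in Sect. III B also accounts for the external pulling potential, which is not reproduced here, and the function name and the work values are hypothetical.

```python
import math


def jarzynski_free_energy(works, T):
    """Plain Jarzynski estimate -T * ln( mean( exp(-W / T) ) ).

    `works` holds the work performed in each repetition of the same protocol,
    all evaluated at the same instant of the protocol. The minimum work is
    factored out before exponentiating to avoid overflow at low temperature.
    """
    w_min = min(works)
    mean_exp = sum(math.exp(-(w - w_min) / T) for w in works) / len(works)
    return w_min - T * math.log(mean_exp)


# Hypothetical work values from three repetitions at T = 0.3.
works = [1.0, 1.5, 3.0]
estimate = jarzynski_free_energy(works, T=0.3)
print(estimate)
```

Because the average is dominated by the smallest work values, the estimate always lies between the minimum and the mean of the sampled works. This is one way to see why fast pulling, which rarely samples low-work trajectories, needs many more repetitions to converge.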



A. Good folder 



In Fig. 5(a) the EJE reconstructions f_J(ζ) (symbols) obtained at T = 0.3 for various pulling velocities are presented 
for the good folder, together with the corresponding WHAM estimate f_W(ζ) (dashed lines). As a first point, we 
notice that the estimated FEL collapses towards f_W(ζ) as the pulling velocity decreases. For the good 
folder the asymptotic shape is reached at small ζ-values for a somewhat larger velocity (namely, for ζ < 10 already at 
v_p = 5 × 10^-4) than at larger ζ. In particular, to reproduce f_W(ζ) up to ζ_trans the pulling should be performed at 
v_p = 5 × 10^-6. Moreover, referring to Fig. 6 it is possible to identify the structural transitions (STs) induced by the 
pulling experiment. As shown in Fig. 5(b), the asymptotic f_J(ζ) profile exhibits a clear minimum in correspondence 
of the end-to-end distance of the NC (namely, ζ_0 ≈ 1.9). In more detail, up to ζ ≈ 5.6 the protein remains in 
native-like configurations characterized by a β-barrel made up of 4 strands, while the escape from the native valley 
is signaled by the small dip at ζ ≈ 5.6, indicated as ST1 in Fig. 5(b). This ST was first identified in 
FIG. 5: (Color online) (a) Free energy profiles f_J for the good folder as a function of the end-to-end distance ζ at T = 0.3, 
obtained with the EJE for various pulling velocities: from top to bottom v_p = 5 × 10^-2, 1 × 10^-2, 5 × 10^-3, 5 × 10^-4, 2 × 10^-4, 
2 × 10^-5 and 5 × 10^-6. In (b) an enlargement of the curve for v_p = 5 × 10^-6 at low ζ is reported. The (black) dashed curve in 
(a) and (b) refers to the WHAM reconstruction f_W(ζ) with k_W = 10. The number of different pulling experiments performed 
to estimate the profiles ranges from 150-250 at the higher velocities to 28 at the lowest velocity v_p = 5 × 10^-6. The 
letters indicate the value of f(ζ) for the pulled configurations reported in Fig. 6(a) and the (blue) vertical solid lines the 
location of the STs. 



[41] by analyzing the potential energy of ISs measured during a mechanical unfolding (numerical) experiment. In 
particular, Lacks [41] identifies this transition as irreversible, in the sense that beyond this transition it 
is no longer sufficient to reverse the stretching to recover the previously visited configurations (see footnote 4). 

For ζ > 6 the configurations are characterized by an almost intact core (made of 3 strands) plus a stretched tail 
corresponding to the pulled fourth strand (see (b) in Fig. 6(a)). The second ST amounts to pulling the strand (PB)_5P 
out of the barrel. In the range 13 < ζ < 18.5 the curve f_J(ζ) appears essentially flat, thus indicating that almost 
no work is needed to completely stretch the tail once it is detached from the barrel (see configuration (c) in Fig. 6(a)). 
The pulling of the third strand (which is part of the core of the NC) leads to a definitive destabilization of the β-barrel. 
This transition is denoted as ST3 in Fig. 5(b). The second plateau in f_J(ζ) corresponds to protein structures made 
up of a single strand (similar to (d) in Fig. 6(a)). 

To distinguish between the entropic and energetic costs associated with each ST, we have also evaluated separately the 
potential energy contributions V_i (i = 1, ..., 4) during the pulling experiment; these data are reported in Fig. 6(b). 
From the figure it is clear that the variation of the potential energy during the stretching is essentially due to the 
Lennard-Jones term V_4, while the other terms contribute to a much smaller extent, at least up to ζ ≈ 35. The 
transition ST1 has essentially only energetic costs, since Δf = 7(1) and the potential energy varies by almost the 
same amount, in particular ΔV ≈ ΔV_4 = 8(1). The other transitions instead have non-negligible entropic costs, since 
the free energy barrier heights associated with ST2 and ST3 are 10(1) and 29(2), respectively, while the corresponding 
potential energy barriers are higher, namely ΔV = 16(1) for ST2 and ΔV = 43(1) for ST3. The complete stretching 
of the protein up to ζ = 35 has a free (resp. potential) energy cost corresponding to Δf = 30(2) (resp. ΔV = 49(1)). 
Above ζ ≈ 35, while the Lennard-Jones and dihedral contributions vanish, the final (almost quadratic) rise of the 
free energy is due to the harmonic and angular contributions, since we are now stretching bond distances and angles 
beyond their equilibrium values. Due to computational constraints and to the fact that this part of the FEL is not 
particularly relevant, the reconstructions at the lowest velocities and the WHAM estimations have not been performed 
for these large ζ-values. 
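
The entropic share of each barrier follows from the difference between the potential energy barrier and the free energy barrier, taken here as ΔV - Δf. The following Python sketch performs this subtraction on the central values quoted in the paragraph above. The dictionary layout and the function name are illustrative choices, and the error bars are ignored.

```python
# Barrier heights (central values) quoted in the text for the good folder at T = 0.3.
GF_BARRIERS = {
    "ST1": {"df": 7.0, "dV": 8.0},
    "ST2": {"df": 10.0, "dV": 16.0},
    "ST3": {"df": 29.0, "dV": 43.0},
    "full stretching": {"df": 30.0, "dV": 49.0},
}


def entropic_contribution(df, dV):
    """Return dV - df, the part of the potential energy cost offset by entropy."""
    return dV - df


entropic = {name: entropic_contribution(**b) for name, b in GF_BARRIERS.items()}
for name, value in entropic.items():
    print(f"{name}: dV - df = {value:.0f}")
```

With these inputs the entropic term is of the order of the quoted uncertainty for ST1, while it is several times larger for ST2, ST3 and the complete stretching.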

In Fig. 7 the reconstruction of the FEL obtained at various temperatures is shown. For temperatures around T_f 
one still observes a FEL resembling the one found for T = 0.3, while by increasing the temperature the dip around 
ζ ≈ 6-7 (associated with ST1) disappears and the heights of the other two barriers decrease. On approaching T_θ the 
first plateau, characterizing the transition from the NC to configurations of type (c), essentially disappears, and it is 



4 Note that we observe this transition at ζ ≈ 5.6 and not at ζ = 4.782 as reported by Lacks, since we are considering the free 
energy profile at T = 0.3, while Lacks' analysis concerns the potential energies of the ISs. Our inspection of the average potential energies 
estimated during the pulling experiments and reported in Fig. 14(a) confirms this small mismatch. 
FIG. 6: (Color online) (a) Pulled configurations of the good folder at T = 0.3: the NC (a) has ζ_0 ≈ 1.9; the others are 
characterized by ζ = 6.8 (b), ζ = 16.8 (c), and ζ = 27.1 (d). The beads of type N, B, and P are colored in green, red and 
yellow, respectively. (b) Potential energy contributions as a function of the end-to-end distance ζ estimated during a pulling 
experiment with speed v_p = 5 × 10^-6 and obtained by averaging over 28 different realizations at T = 0.3. (Black) stars indicate 
the entire potential energy V, (orange) crosses V_1, (blue) triangles V_2, (magenta) diamonds V_3, and (red) squares V_4. The 
(blue) vertical solid lines indicate the transitions previously discussed in the text. 

FIG. 7: (Color online) Free energy profiles f_J(ζ) obtained with the EJE for the good folder at various temperatures: T = 0.2 (red 
squares), 0.4 (green stars), 0.5 (orange circles), 0.6 (magenta plus) and 0.7 (blue triangles). In the inset, an enlargement is 
reported at small ζ. Data refer to v_p = 5 × 10^-4. The number of different realizations performed to estimate the averages at 
the different temperatures ranges between 160 and 250. 



substituted by a monotonic increase of f_J(ζ). This suggests that 4-stranded β-barrel configurations coexist with 
partially unfolded ones. Above T_θ only one barrier remains, indicating that at these temperatures the protein unfolds 
completely in a one-step process. 

The connection between dynamical properties of the system and the free energy profile is still an open problem. In 
particular, the relationship between the unfolding times and the free energy barriers has been previously discussed in 
Ref. [52] for proteins, and more recently the same problem has been addressed for an Ising-like lattice protein model in 
Ref. [53]. We have estimated average first passage times τ via USs by recording the time needed by the protein to 
reach a certain end-to-end threshold ζ_th once it starts from the NC at different temperatures. Our data, reported in 
Fig. 8, clearly indicate that at low temperatures the simple result of the transition state theory [55, 56], namely 

$$ \tau = \frac{e^{\Delta f / T}}{T} \,, \qquad (24) $$



where Δf = f(ζ_th) − f(ζ_0), is in very good agreement with the numerics. However, at high temperatures the 
agreement worsens. Therefore, in order to take into account all the details of the free energy profile and not only the 



FIG. 8: (Color online) Average unfolding times τ for the GF at various temperatures corresponding to ζ_th = 4. Filled (black) 
circles denote the numerical data; the estimates obtained via Eq. (24) and Eq. (25) are represented by empty (blue) diamonds 
and (red) stars, respectively. The arbitrary scaling factor entering Eq. (25) (see text) has been set equal to 8. The average 
times have been estimated over 100,000-200,000 unfolding events for T = 0.7 and 0.6, 12,000 events at T = 0.5 and as few 
as 200 and 60 events at the lowest temperatures, namely T = 0.4 and 0.3. 



barrier height, we have generalized a result of the Smoluchowski theory for the escape of a particle from a potential 
well [El] as follows H 




where the potential energy has been substituted by the free energy profile. The estimates obtained via Eq. (25) 
compare well with the numerical results at all the considered temperatures, apart from an arbitrary scaling 
factor, common to all the temperatures, that we are unable to estimate (see Fig. 8). 
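
To illustrate how a full free energy profile, rather than the barrier height alone, can enter an escape-time estimate, the Python sketch below evaluates the textbook Smoluchowski mean first passage time for one-dimensional overdamped diffusion, τ = (1/D) ∫ dy e^{f(y)/T} ∫ dz e^{-f(z)/T}. The outer integral runs from ζ_0 to ζ_th and the inner one from ζ_0 to y, with a reflecting boundary at ζ_0 and an absorbing one at ζ_th. This form, the constant diffusion coefficient D (which plays the role of an undetermined scale factor), the function name and the flat test profile are all assumptions made for illustration. The expression actually used in Eq. (25) may differ in its limits or prefactor.

```python
import math


def mfpt_smoluchowski(zeta, f, T, D=1.0):
    """Mean first passage time from zeta[0] (reflecting) to zeta[-1] (absorbing).

    `zeta` is an increasing grid of end-to-end distances and `f` the free energy
    on that grid. Both integrals use the trapezoidal rule. The profile is shifted
    by f[0], which leaves the product of the two exponentials unchanged.
    """
    boltz = [math.exp(-(fi - f[0]) / T) for fi in f]      # e^{-f(z)/T}
    inv_boltz = [math.exp((fi - f[0]) / T) for fi in f]   # e^{+f(y)/T}

    # Running inner integral of e^{-f/T} from zeta[0] up to each grid point.
    inner = [0.0]
    for i in range(1, len(zeta)):
        h = zeta[i] - zeta[i - 1]
        inner.append(inner[-1] + 0.5 * h * (boltz[i] + boltz[i - 1]))

    # Outer integral of e^{+f/T} times the running inner integral.
    outer = 0.0
    for i in range(1, len(zeta)):
        h = zeta[i] - zeta[i - 1]
        outer += 0.5 * h * (inv_boltz[i] * inner[i] + inv_boltz[i - 1] * inner[i - 1])
    return outer / D


# Sanity check on a flat profile, where the exact result is L**2 / (2 * D).
grid = [0.01 * i for i in range(401)]          # zeta from 0 to 4
flat = [0.0 for _ in grid]
tau_flat = mfpt_smoluchowski(grid, flat, T=0.3)
print(tau_flat)
```

On the flat profile the routine reduces to free diffusion over a length L = 4, which gives a simple check of the integration before a reconstructed profile such as f_J(ζ) is supplied.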



B. Bad folder 



In Fig. 9(a) the free energy profiles f_J(ζ) reconstructed via the EJE at T = 0.3 for different pulling 
speeds (symbols) are reported together with the estimated f_W(ζ) (dashed line). As in the case of the GF, one observes a collapse to 
the equilibrium FEL (represented by f_W(ζ)) for a sufficiently small speed. In particular, at v_p = 5 × 10^-6 a reasonably 
good agreement between f_J and f_W is already achieved. 

For the BF the mechanically induced unfolding transitions are less clearly identifiable from the inspection of the free 
energy profile, for two reasons. Firstly, for the BF not only the LJ interactions play a role in the STs but also the 
dihedral terms: these two terms contribute with opposite signs to the total potential energy, thus partially canceling 
each other. Moreover, as we will show in the following, the main contribution to the free energy is due to entropic 
terms. Therefore, in order to identify the STs it is better to consider the distinct average profiles of the single potential 
contributions V_i (i = 1, ..., 4) reported in Fig. 9(b). In particular, the most relevant is the Lennard-Jones term V_4, 
due to the stabilizing effect of the hydrophobic interactions on the protein structure. From the inspection of V_4, at 
least four different STs can be singled out, occurring at ζ ≈ 7.3, 14.5, 19.3, and 26.3, respectively. 

The first transition amounts to pulling the last part of the tail out of the NC, namely the 6th and 5th strands that we 
have previously identified. This ST is associated with a free energy increase of 3.1(5) and a potential energy variation of 
8.0(5); once ST1 is completed, the protein assumes the configuration (b) shown in Fig. 10. ST2 consists in pulling 
the whole tail out of the compact configuration (therefore detaching also the 4th strand), leaving the protein in 
a configuration composed of the core (represented by the first three strands) plus a long tail (see configuration (c) in 
Fig. 10). The entropic contribution to ST2 is quite relevant, since in passing from the NC to (c) the free energy increases 
by 3.8(5), while the associated potential energy variation is almost triple, i.e. 11.5(5). The third transition amounts 
to detaching the first β-strand (BNBPB3NP) from the core, and this operation has much greater costs with respect to 
the previous STs, namely Δf = 7.0(5) and ΔV = 15(1). The complete opening of the core structure (now made 
only of the second and third strand) occurs at ζ ≈ 27, amounting to a total free (resp. potential) energy barrier 
to overcome of height 11(1) (resp. 23(1)). At variance with the GF case, for the BF the entropic costs are never 




FIG. 9: (Color online) (a) Free energy profiles f_J for the bad folder as a function of the end-to-end distance ζ, obtained with 
the EJE for various pulling velocities: from top to bottom v_p = 5 × 10^-4 and 160 realizations (black circles), 2 × 10^-4 and 
200 realizations (red squares), 1 × 10^-4 and 200 realizations (blue triangles), 5 × 10^-5 and 100 realizations (green diamonds), 
5 × 10^-6 and 28 realizations (magenta stars). The WHAM estimate f_W(ζ) is also shown (black dashed line). In the inset an 
enlargement of the curve at low ζ for v_p = 5 × 10^-6 is reported together with f_W(ζ). Data have been obtained at T = 0.3. (b) 
Potential energy contributions as a function of the end-to-end distance ζ estimated during a pulling experiment with velocity 
v_p = 5 × 10^-6 and obtained by averaging over 28 different realizations at T = 0.3. (Black) stars indicate the entire potential 
energy V, (orange) crosses V_1, (blue) triangles V_2, (magenta) diamonds V_3, and (red) squares V_4. The (blue) solid lines indicate 
the transitions discussed in the text. 



negligible; instead, they always amount to at least half of the potential energy contributions in all the four 
examined transitions. Finally, analogously to the GF, for ζ > 35 the LJ and dihedral contributions essentially vanish 
and the free energy increase is due to the harmonic and angular terms only. 

In Fig. 11 the reconstruction f_J of the FEL for the bad folder is reported at three temperatures below T_θ. As one 
can notice, the bad folder exhibits at comparable temperatures much lower free energy barriers than the GF, indicating that the 
NC and the partially folded structures are less stable. This is reflected also in the value of T_θ, 
which is smaller than for the GF: namely, 0.46 for the BF and 0.65 for the GF. By increasing T the heights 
of the free energy barriers rapidly decrease and the various STs become less clearly defined. Moreover, the FEL of 
the BF at the lowest examined temperature (T = 0.2) reveals, besides the absolute minimum (corresponding to the 
NC), two other local minima at ζ ≈ 7 and ζ ≈ 11. This indicates that, at variance with the GF, the BF can remain 
trapped even at T ≈ T_f, for some finite time, in intermediate (misfolded) states far from the NC. 



VIII. INHERENT STRUCTURE LANDSCAPE 



In this section we compare the reconstructions of the FEL for the good and bad folder obtained via the EJE and 
the IS approach with the WHAM equilibrium estimate. As already explained in Sect. III, we have created two 
IS data banks: the thermal data bank (TDB), obtained by performing equilibrium canonical simulations, and the 
pulling data bank (PDB), obtained by mechanically unfolding the protein. In Fig. 12 the comparison for the GF is reported, 
at three temperatures, between the estimate f_W(ζ), f_IS(ζ) and f_J(ζ), the latter obtained via the EJE reconstruction. 
The results reveal an astonishingly good coincidence between f_W(ζ) and f_IS(ζ), obtained by employing the PDB, 
at all the examined temperatures. As for the EJE reconstructions, at T = 0.3 f_J(ζ) is essentially in 
good agreement with the other two estimates, while at higher temperatures the f_J curves slightly overestimate the 
equilibrium free energy f_W for ζ > 10. This discrepancy is probably due to an incomplete convergence of the EJE 
approach at the considered pulling velocities: smaller velocities are required to recover the equilibrium profile at all 
the end-to-end distances. 

The further comparison reported in Fig. 12 between the IS reconstructions obtained via the TDB and the PDB 
indicates a perfect coincidence up to ζ ≈ 17. On the contrary, during the last stage of the unfolding process the 
two f_IS differ: the TDB FEL is steeper than the PDB one. This suggests that during the mechanical unfolding the 
protein can more easily reach states with low energies, even at large ζ. These states have a very low probability of being 
visited during thermal equilibrium dynamics. However, at T = 0.3 the value of the barrier to overcome and that of 
the final plateau are quite similar to those of the PDB FEL, while at higher temperatures the final energy plateaus 



FIG. 10: (Color online) Pulled configurations of the bad folder at T = 0.3: the reported configurations refer to ζ_0 = 4.7 (NC) 
(a), ζ = 9.9 (b), 14.5 (c), 22.1 (d), 24.6 (e), and 29.7 (f). 

of the TDB FEL are slightly larger than the /n/-plateaus. The reason of these discrepancies is related to the fact 
that, despite the high number of ISs forming the TDB, this data bank is far from containing all the relevant ISs; in 
particular, those associated with high ζ-values are lacking. It should be remarked that the IS conformation with the 
maximal end-to-end distance is the all-trans configuration, corresponding to ζ_trans = 35.70; therefore the IS approach 
does not allow one to evaluate the FEL for ζ > ζ_trans. For the GF, we can safely affirm that the out-of-equilibrium process 
consisting in stretching the protein is more efficient for investigating the FEL, since a much smaller number of ISs is 
needed to reliably reconstruct it, as reported in Table II. 

The comparison for the BF case is reported in Fig. 13 at T = 0.3 and 0.4. Also in this case f_W(ζ) and f_IS(ζ) 
essentially coincide, except at T = 0.3 and ζ > 20, where f_W is slightly higher than f_IS. In this case the agreement 
between the two IS reconstructions is quite good at both the considered temperatures and for all ζ-values. As far as 
the EJE reconstructions are concerned, at the employed pulling velocity (namely, v_p = 5 × 10^-6) f_J can be considered 
as asymptotic at T = 0.3, while at T = 0.4 it is probably still slightly overestimating f_W; note, however, the really 
small range of the free energy scale reported in Fig. 13(b) with respect to the GF. 

Furthermore, from the IS analysis, by employing Eq. (17), we can obtain an estimate of the profiles of the potential 
and vibrational free energies V_IS(ζ) and R_IS(ζ), respectively. From the latter quantity, the entropic costs associated 
with the various unfolding stages can be estimated. As shown in Fig. 14(a) for the GF at T = 0.3, the structural 
transitions ST2 and ST3 previously described correspond to clear "entropic" barriers, while the ST1 transition has 
only energetic costs, since ΔR_IS ≈ 0. This last result is in good agreement with the previously reported EJE analysis. 
As for the other two transitions, ST2 (resp. ST3) is associated with a decrease ≈ 6(1) (resp. 15(2)) of R_IS(ζ), 
once more in agreement with the EJE reconstruction. The complete opening of the protein is associated with a barrier 
ΔR_IS(ζ) = 20(2), while the analysis reported in Sect. VII A indicates an entropic barrier to overcome corresponding 
to ≈ 19(2). These results suggest that for the good folder the entropic contributions to the free energy are essentially 
of the vibrational type. Moreover, the reconstructed potential energies V_IS(ζ) are in very good agreement with the 



FIG. 11: (Color online) Free energy profiles f_J(ζ) obtained via the EJE for the bad folder at three temperatures: namely, T = 0.2 
(red squares), T = 0.3 (orange circles), T = 0.4 (blue stars). In the inset an enlargement is reported at small ζ. Data refer to 
pulling velocity v_p = 5 × 10^-6 and the averages are performed over 28 samples of the same protocol. 

FIG. 12: (Color online) Free energy profiles f_J (blue solid lines) as a function of ζ for various temperatures for the good folder: 
a) T = 0.3 for v_p = 5 × 10^-6 and 28 repetitions; b) T = 0.4 for v_p = 5 × 10^-4 and 240 experiments; c) T = 0.5 for v_p = 5 × 10^-4 
and 240 repetitions. The (black) dashed lines refer to the WHAM estimation f_W(ζ), (green) squares to f_IS(ζ) obtained by 
employing the TDB and (red) circles to f_IS(ζ) obtained by employing the ISs in the PDB for each considered T. 

average potential energy evaluated during the corresponding pulling experiments, as shown in Fig. 14(a). 

Finally, one can try to put in correspondence the three unfolding stages previously discussed for the GF with 
thermodynamical aspects of the protein folding. In particular, by considering the energy profile V_IS(ζ), an energy 
barrier ΔV_IS and a typical transition temperature T_t = (2ΔV_IS)/(3N) can be associated with each of the STs. The 
first transition ST1 corresponds to a barrier to overcome ΔV_IS = 8(1) and therefore to T_t = 0.11(1), which, within 
error bars, coincides with T_g. For the ST2 transition the barrier to overcome is ΔV_IS = 16(1) and this is associated 
with a temperature T_t ≈ 0.23(2) (slightly smaller than T_f). At the ST3 transition ΔV_IS = 43(2), corresponding to T_t = 
0.62(2), while the energetic cost to completely stretch the protein is 50(2), with an associated transition temperature 
T_t = 0.72(2): the θ-temperature (T_θ = 0.65(1)) is well bracketed within these two transition temperatures. At least 
for the GF, our results indicate that the observed STs induced by pulling can be put in direct relationship with the 
thermal transitions usually identified for the folding/unfolding process. 
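
The mapping from energy barriers to transition temperatures can be checked with a few lines of Python. The sketch applies T_t = 2ΔV_IS/(3N) to the central barrier values quoted above. It assumes N = 46, taken from the reference to the 46th bead as the last bead of the chain; the function and variable names are illustrative.

```python
N_BEADS = 46  # assumed chain length: the text refers to the 46th bead as the last one


def transition_temperature(delta_v_is, n_beads=N_BEADS):
    """T_t = 2 * dV_IS / (3 * N), the relation quoted in the text."""
    return 2.0 * delta_v_is / (3.0 * n_beads)


# Central values of the IS potential energy barriers for the good folder.
BARRIERS = {"ST1": 8.0, "ST2": 16.0, "ST3": 43.0, "full stretching": 50.0}
T_T = {name: transition_temperature(dv) for name, dv in BARRIERS.items()}
for name, t in T_T.items():
    print(f"{name}: T_t = {t:.3f}")
```

Within the quoted uncertainties the four values agree with those reported in the text, and the last two bracket T_θ = 0.65.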

Also for the BF the IS approach is able not only to reproduce well the average potential energy during the pulling 
experiment, as clearly shown in Fig. 14(b), but also to provide a good estimate of the "entropic" barriers associated with 
the structural transitions. In particular, at T = 0.3 the vibrational free energy barriers to overcome are ΔR_IS = 5.3(5) 




FIG. 13: (Color online) Free energy profiles f_J as a function of ζ for various temperatures for the bad folder: a) T = 0.3 and 
b) T = 0.4. The data refer to a pulling velocity v_p = 5 × 10^-6 and 28 repetitions of the same pulling protocol. The symbols 
are the same as in Fig. 12. 




FIG. 14: (Color online) Reconstructed V_IS(ζ) (lower panel) and R_IS(ζ) (upper panel) for good folder (a) and bad folder (b) by 
employing ISs in the PDB at T = 0.3. In the lower panel the blue dotted line refers to the average potential energy evaluated 
during the corresponding pulling experiments (this has been already reported in Fig. 6(b) for the GF and in Fig. 9(b) for the 
BF). Note that the data have been vertically translated in order to have zero energy at the NC. 



at ST1, 8(1) at ST2, 10(1) at ST3 and 16(1) at ST4. These values are in reasonably good agreement with those 
previously obtained from the EJE reconstruction, except at ST3 and ST4, where the analysis performed in Sect. VII B 
indicates entropic barriers to overcome corresponding to ≈ 8(1) and ≈ 12(2), respectively. These underestimations at 
large ζ-values are probably due to the fact that at this temperature the estimated f_J has not reached its asymptotic 
shape at the employed velocity. 
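
The comparison just described can be reproduced from the numbers quoted in the text. The Python sketch below derives the EJE entropic barriers for the BF as ΔV - Δf and checks them against the IS vibrational barriers. Combining the quoted uncertainties in quadrature and using that combination as the agreement criterion are assumptions made for this illustration, as are the names used in the code.

```python
import math

# (value, error) pairs quoted in the text for the bad folder at T = 0.3.
EJE = {  # cumulative free energy and potential energy barriers from pulling
    "ST1": {"df": (3.1, 0.5), "dV": (8.0, 0.5)},
    "ST2": {"df": (3.8, 0.5), "dV": (11.5, 0.5)},
    "ST3": {"df": (7.0, 0.5), "dV": (15.0, 1.0)},
    "ST4": {"df": (11.0, 1.0), "dV": (23.0, 1.0)},
}
IS_VIBRATIONAL = {"ST1": (5.3, 0.5), "ST2": (8.0, 1.0), "ST3": (10.0, 1.0), "ST4": (16.0, 1.0)}


def compare(eje, vib):
    """Return {ST: (entropic barrier from EJE, IS minus EJE, agreement flag)}."""
    rows = {}
    for name, b in eje.items():
        entropic = b["dV"][0] - b["df"][0]
        entropic_err = math.hypot(b["dV"][1], b["df"][1])  # assumed: errors add in quadrature
        diff = vib[name][0] - entropic
        tolerance = math.hypot(entropic_err, vib[name][1])
        rows[name] = (entropic, diff, abs(diff) <= tolerance)
    return rows


rows = compare(EJE, IS_VIBRATIONAL)
for name, (entropic, diff, ok) in rows.items():
    print(f"{name}: EJE entropic = {entropic:.1f}, IS - EJE = {diff:+.1f}, agree = {ok}")
```

Under this criterion ST1 and ST2 agree, while ST3 and ST4 do not, in line with the statement that the discrepancy appears at large ζ-values.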

As already pointed out, the entropic contributions for the BF are more relevant than for the GF: e.g., 
while the ST2 transition is clearly visible by inspecting the potential energy, it is almost absent when looking at the free 
energy profile (compare the data reported in Fig. 9(a) and (b)). Therefore we cannot expect to infer information on 
the thermal transitions from the knowledge of the potential energy barriers at the STs, as done for the GF. Indeed, 
the estimated transition temperatures T_t for the four examined structural transitions give values not corresponding 
to any of the relevant temperatures reported in Table III for the BF. 

To better understand this difference we have performed USs for the GF and BF for T_g < T < T_θ and we have 
estimated the average, the minimal and the maximal ζ associated with the visited ISs. The corresponding data are 
reported in Fig. 15. While for the GF the minimal value remains essentially ζ_0 at all the temperatures and the 
maximum ζ increases smoothly up to ≈ 18 at T = T_θ, the dependence of the minimal and maximal ζ-values on T 
is more dramatic for the BF. Up to temperatures T ≈ 0.5 × T_θ, the average, minimal and maximal ζ-values almost 
FIG. 15: (Color online) End-to-end distance of the ISs estimated during USs at various temperatures: (black) circles represent 
the average value; (blue) stars the minimal value; and (red) squares the maximal one. The upper panel refers to the BF and 
the lower one to the GF. The horizontal magenta dashed line indicates the ζ_trans value. For the GF (resp. BF) trajectories of 
duration t ≈ 100,000-500,000 (resp. t ≈ 50,000-250,000) have been examined to obtain the ISs at constant time intervals 
Δt = 5. 

coincide, indicating that the protein is still confined around the NC; recall that for the BF T_g = 0.58 × T_θ. 
As soon as T > 0.6 × T_θ the maximum grows abruptly and reaches the upper bound corresponding to ζ_trans already at 
T ≈ T_θ; on the other hand, the minimum value decreases, indicating that at higher temperatures the protein can access 
basins of ISs with end-to-end distance lower than ζ_0. This last result indicates that there is not a clear monotonic 
correspondence between the temperature increase and the achievable protein extensions. Moreover, the fact that the 
protein can easily attain also extremely stretched configurations at not too high temperatures suggests that in the 
case of the BF the protein can easily escape from the native valley and reach any part of the phase space, while for 
the GF the accessible IS configurations are much more limited at comparable temperatures. All this amounts to saying 
that the end-to-end distance cannot be considered as a good reaction coordinate for the BF. 

IX. CONCLUDING REMARKS 

In conclusion, we can safely affirm that the reconstructions of the free energy landscape as a function of the end-to-end 
distance in terms of the ISs, obtained via out-of-equilibrium mechanical unfolding of the heteropolymers, are 
in very good agreement with the equilibrium weighted histogram estimate for the good and bad folder sequences at 
all the examined temperatures. In particular, this result indicates that the harmonic approximation employed to 
estimate the vibrational term, Eq. (13), is quite good for temperatures in the range [T_f, T_θ], as already pointed out in [11] 
by considering the average potential energy. Moreover, the EJE reconstructions of the free energy profile compare 
quite well with the other two approaches for sufficiently low pulling velocities. For the good folder, the quality of the 
free energy landscape reconstruction via the extended Jarzynski equality can be well appreciated by stressing that 
from purely structural information about the landscape a good estimate of dynamical quantities, like the unfolding 
times from the native configuration, can be obtained. 

Furthermore, for the good folder the information obtained on the equilibrium FEL with both the EJE and the IS 
methodologies can be usefully combined to give substantiated hints about the thermal unfolding. In particular, the 
investigation of the ISs allows us to give an estimate of the (free) energetic and entropic barriers separating the native 
state from the completely stretched configuration. These barriers are associated with the structural transitions induced 
by the protein manipulation, and for the good folder they can be put in direct relationship with the thermal transitions 
usually identified during folding/unfolding processes. 

On the other hand, for the bad folder the end-to-end distance appears not to represent a good reaction coordinate, 
since mechanical and thermal unfolding seem to follow different paths. In other terms, the unfolding process for the 
good folder consists of many small successive rearrangements of the NC, which are well captured by the distribution 
of the corresponding ISs on the landscape, while for the bad folder the thermal unfolding can also involve large 
conformational rearrangements, thus implying jumps from one valley of the landscape to another associated with large 
variations in the end-to-end distance, which cannot be well reproduced by the mechanical stretching of the heteropolymer. 
Future work on more realistic heteropolymer models is needed to clarify whether the observed features, distinguishing 
good folders from bad folders, can really be considered as a specific trademark of proteins. 

A drawback of the EJE reconstruction is that extremely small velocities or an extremely large number of repetitions 
of the protocol are needed to achieve the collapse towards the equilibrium profile, thus rendering the implementation 
of the method quite time consuming. However, new optimized methods to obtain the asymptotic FEL, by combining 
the Jarzynski equality with Crooks' path ensemble average theorem, have been recently published pA [58] and it 
will definitely be worth testing their performance in the near future with respect to complex landscapes, like those 



As a final point, we would like to recall that, in the context of glassy systems, the concept of ISs has been 
critically compared to that of pure states [60], the latter being local minima of the free energy landscape, while the 
ISs are minima of the potential energy, as discussed above. The relevance of the pure states for protein folding has 
been recently stressed in Ref. [61], where it has been shown for a fibronectin domain that pure states can be put in 
direct correspondence with unfolding intermediates observable during mechanical pulling. However, in the present 
paper we have been interested only in how the FEL, which is the only thermodynamically relevant function, together 
with the corresponding pure states, can be obtained by employing a suitably chosen ensemble of ISs. 